Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename Sparkey hash* methods to avoid shadowing regular methods #3894

Merged
merged 4 commits into from
Aug 31, 2021

Conversation

devictr
Copy link
Contributor

@devictr devictr commented Jul 2, 2021

The hashJoin, hashLeftOuterJoin, hashFullOuterJoin, hashIntersectByKey, hashSubtractByKey methods in com.spotify.scio.extra.sparkey.PairLargeHashSCollectionFunctions were shadowing the corresponding methods in com.spotify.scio.values.PairHashSCollectionFunctions.

As an example, the following test doesn't compile:

  it should "not shadow the regular hashJoin method" in {
    import com.spotify.scio.extra.sparkey._

    val sc = ScioContext()

    val lhsInput = Seq((1, "a"), (2, "c"), (3, "e"), (4, "g"))
    val rhsInput = Seq((1, "b"), (2, "d"), (3, "f"))

    val rhs = sc.parallelize(rhsInput)
    val lhs = sc.parallelize(lhsInput)

    val result = lhs
      .hashJoin(rhs)
      .materialize

    val scioResult = sc.run().waitUntilFinish()
    val expectedOutput = List((1,("a", "b")), (2,("c","d")), (3,("e","f")))

    scioResult.tap(result).value.toList should contain theSameElementsAs expectedOutput
}

This PR contains a breaking change that fixes this issue. I think renaming those (relatively new) methods to largeHash* makes more sense, since they all take an input like SideInput[SparkeySet[K]] or SideInput[SparkeyMap[K]]

Please let me know if this isn't an approach you want to consider and if you'd like to fix this another way.

Copy link
Member

@psobot psobot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one comment - although I'm curious how @nevillelyh and/or @regadas feel about breaking API compatibility. (This is a brand new feature without much usage yet, so I'm personally okay with it here.)

@@ -30,6 +30,6 @@ class LargeHashSCollectionFunctions[T](private val self: SCollection[T]) {
*/
def hashFilter(sideInput: SideInput[SparkeySet[T]]): SCollection[T] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also change to largeHashFilter while we're at it? (There is a regular hashFilter as well, it seems.)

@psobot
Copy link
Member

psobot commented Jul 2, 2021

Also it looks like scalafmt is complaining - can you try running sbt scalafmtAll @devictr?

@codecov
Copy link

codecov bot commented Jul 2, 2021

Codecov Report

Merging #3894 (080fab0) into main (08b99b1) will decrease coverage by 0.05%.
The diff coverage is 84.61%.

❗ Current head 080fab0 differs from pull request most recent head 465996a. Consider uploading reports for the commit 465996a to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3894      +/-   ##
==========================================
- Coverage   62.07%   62.01%   -0.06%     
==========================================
  Files         286      286              
  Lines       10222    10196      -26     
  Branches      358      371      +13     
==========================================
- Hits         6345     6323      -22     
+ Misses       3877     3873       -4     
Impacted Files Coverage Δ
.../src/main/scala/com/spotify/scio/ScioContext.scala 82.12% <ø> (-0.76%) ⬇️
.../main/scala/com/spotify/scio/pubsub/PubsubIO.scala 34.09% <0.00%> (ø)
...in/scala/com/spotify/scio/values/SCollection.scala 94.04% <66.66%> (+0.63%) ⬆️
.../extra/sparkey/LargeHashSCollectionFunctions.scala 100.00% <100.00%> (ø)
...ra/sparkey/PairLargeHashSCollectionFunctions.scala 100.00% <100.00%> (ø)
...spotify/scio/pubsub/syntax/SCollectionSyntax.scala 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 08b99b1...465996a. Read the comment docs.

Copy link
Contributor

@regadas regadas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 we will release this under 0.11.0

@nevillelyh nevillelyh added this to the 0.11.0 milestone Aug 30, 2021
@regadas regadas force-pushed the victord/rename-sparkey-joins branch from 33d992b to 465996a Compare August 31, 2021 09:20
@regadas regadas added the bug Something isn't working label Aug 31, 2021
@regadas regadas merged commit bb29356 into spotify:main Aug 31, 2021
regadas added a commit to regadas/scio that referenced this pull request Sep 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants